Abstract

Background: Understanding the basis for volatile organic compound (VOC) biosynthesis and regulation is of great
importance for the genetic improvement of fruit flavor. Lactones constitute an essential group of fatty acid-derived
VOCs conferring peach-like aroma to a number of fruits including peach, plum, pineapple and strawberry. Early
studies on lactone biosynthesis suggest that several enzymatic pathways could be responsible for the diversity of
lactones, but detailed information on them has remained elusive. In this study, we have integrated genetic mapping
and genome-wide transcriptome analysis to investigate the molecular basis of natural variation in γ-decalactone
content in strawberry fruit.

Results: The fatty acid desaturase FaFAD1 was identified as the gene underlying the locus at LGIII-2 that
controls γ-decalactone production in ripening fruit. The FaFAD1 gene is specifically expressed in ripe fruits and its
expression fully correlates with the presence of γ-decalactone in all 95 individuals of the mapping population. In
addition, we show that the level of expression of FaFAH1, with similarity to cytochrome P450 hydroxylases, significantly
correlates with the content of γ-decalactone in the mapping population. The analysis of expression quantitative trait
loci (eQTL) suggests that the product of this gene also has a regulatory role in the biosynthetic pathway of lactones.

Conclusions: Altogether, this study provides mechanistic information on how the production of γ-decalactone is
naturally controlled in strawberry, and proposes enzymatic activities necessary for the formation of this VOC in plants.
Background

The flavor and aroma of strawberries (Fragaria × ananassa)
arise from a specific combination of sugars, acids and
volatile organic compounds (VOCs) that varies widely
among different cultivars and Fragaria species [1].
More than 360 VOCs have been detected in strawberry,
including esters, aldehydes, ketones, alcohols, terpenes,
furanones, and sulfur compounds [2-6]. Lactones constitute
a group of fatty acid-derived flavor molecules,
which have γ-(4-) or δ-(5-)lactone structures, and have
been isolated from bacterial, plant and animal sources
[7,8]. Fruits are considered a particularly rich source
of lactones, which confer a peach-like aroma and flavor that
attracts feeders for seed dispersal [9,10]. During
strawberry maturation, the levels of compounds defined as
green-volatiles decrease whereas levels of flavor compounds
characteristic of ripe fruits, including esters and lactones,
increase in parallel to other ripening-regulated processes
such as anthocyanin accumulation [4]. Up to 10 different
lactones have been identified in strawberry [1,3,11] and,
among them, γ-decalactone is the most abundant,
reaching maximum levels in fully red ripe fruits [4,12].

Lactones containing 8-12 carbon atoms are very potent
flavor constituents in a variety of fruits such as strawberry,
pineapple and peach. Biosynthetic studies indicate that
several pathways originating from β-oxidation of unsaturated
fatty acids are responsible for the structural diversity
of lactones [9,10,13]. All lactones originate from their
corresponding 4- or 5-hydroxy carboxylic acids, although
the precise mechanism by which these substrates are
produced remains elusive [9]. However, four different
mechanisms have been proposed, suggesting that the
oxygen atom could be introduced by either (1) reduction
of oxo acids by NAD-linked reductases, (2) hydration
of unsaturated fatty acids, (3) epoxidation and
hydrolysis of unsaturated fatty acids, or (4) reduction of
hydroperoxides [9,14]. To the best of our knowledge, the
enzymes specifically involved in the formation of lactones
have not yet been reported; however, candidate
enzymatic activities, such as acyl-CoA dehydrogenase,
which is the first enzyme in fatty acid β-oxidation, have
been proposed to be important for lactone production
[15]. Epoxide hydrolases have been associated with
γ-dodecalactone biosynthesis in peach, implying that the
synthesis of this lactone could proceed through epoxidation
of unsaturated fatty acids [10]. Alternatively, the
hydroxylation of unsaturated fatty acids could involve
desaturases and cytochromes P450 (CYP) or other
hydroxylases not related to CYPs [14].

The concentration of lactones in peach is controlled
by multiple loci with quantitative effects (quantitative
trait loci, QTL), hampering the identification of the genetic
determinants controlling their biosynthesis or regulation
[16]. In contrast, the content of γ-decalactone has
been shown to be controlled by one dominant locus in
strawberry and, consequently, a number of strawberry
cultivars lacking γ-decalactone have been reported [5,6,17].
We have previously reported that γ-decalactone is produced
in the parental line '1392' but not in '232' of a strawberry
mapping population and that the segregation of their
F1 progeny matched a Mendelian 1:1 ratio, with the locus
controlling this trait mapped to the bottom arm of LG
III-2 [6]. In this context, this population represents a
valuable tool to identify the gene(s) responsible for the
natural variation of this VOC in strawberry, and to provide
novel information about its biosynthesis in plants.
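The 1:1 segregation mentioned above is the kind of claim that is usually checked with a chi-square goodness-of-fit test. The following is a minimal sketch of such a test, not the procedure used in [6]; the function name `chi_square_1to1` and the progeny counts are hypothetical and serve only as an illustration.

```python
import math


def chi_square_1to1(n_present: int, n_absent: int) -> tuple[float, float]:
    """Chi-square goodness-of-fit test against a 1:1 ratio (1 degree of freedom)."""
    total = n_present + n_absent
    expected = total / 2.0
    chi2 = ((n_present - expected) ** 2 + (n_absent - expected) ** 2) / expected
    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts (NOT taken from the paper): 50 producers, 45 non-producers.
chi2, p = chi_square_1to1(50, 45)
print(f"chi2 = {chi2:.3f}, p = {p:.3f}")
```

A large p-value means the observed counts are compatible with a 1:1 ratio, as expected when a single dominant locus is heterozygous in one parent and homozygous recessive in the other.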

Although cultivated strawberry is an octoploid (2n =
8x = 56), and as such, four subgenomes are present in its
complex genome, most loci show a disomic segregation.
Furthermore, its genome has a high level of conservation
with the model species Fragaria vesca (2n = 2x = 14),
including an almost complete synteny and high colinearity
[18-22]. Thus, the available genome sequence of the
diploid F. vesca can be used as a reference for genomic
and genetic studies within the genus [23].

RNA-seq is replacing other methods of quantifying
transcript expression, including microarray platforms
[24], as it overcomes some of their limitations, such as
the detection of only those transcripts that are represented
on microarrays and a low dynamic range (limited upper and
lower limits of detection), and thus provides more accurate
quantification of differential transcript expression. A
clear advantage of RNA-seq is the detection of novel
non-annotated transcripts and, most relevant for highly
heterozygous plants and polyploids such as F. × ananassa,
the detection of the different alleles and homoeologous
genes within their genomes [24,25]. In this report, we have
combined genome-wide RNA-seq analysis with a bulk segregant
approach to identify a gene controlling γ-decalactone
content in strawberry. Additional candidate genes of the
biosynthetic pathway of lactones are also reported based
on this genome-wide analysis. Altogether, this study
provides information on how the content of γ-decalactone
is naturally controlled in strawberry fruit and proposes
enzymatic activities necessary for the formation of this
VOC in plants.

Methods 

Plant material 

The '232' × '1392' F1 mapping population, comprising 95
progeny lines, was used in this study. This population is
derived from the cross between selection lines '232' and
'1392' and is described in detail in [19]. '232' is a very
productive strawberry (Fragaria × ananassa) line, whereas
'1392' has firmer and tastier fruits [6,19]. The mapping
population was grown in the strawberry-producing area of
Huelva (Spain) under commercial conditions during the
2011/2012 season. Six plants of each line were vegetatively
propagated and grown. Ripe fruits (10-15) were collected
on the same day from the six plants of each line, divided
into three biological replicates and independently ground
in liquid nitrogen. Samples were stored at -80°C until
further analysis.

RNA isolation and RNA-seq from pooled samples 

Equivalent amounts of ripe fruit tissue from 10
γ-decalactone-producing and 10 non-producing progeny lines
(Table 1) were collected in triplicate and separately pooled
for RNA extraction. The three biological replicates of

Table 1 Relative concentration of γ-decalactone in fruits
of selected progeny lines producing and not producing
this compound

Fruit samples with γ-decalactone        Fruits w/o γ-decalactone
Line     2007    2008    2009           Line     2007    2008    2009
93-01    4.456   1.287   2.849          93-03    0.000   0.019   0.014
93-12    3.114   1.185   1.875          93-07    0.002   0.001   0.005
93-19    3.101   4.300   2.865          93-14    0.010   0.007   0.019
93-36    3.022   3.884   3.182          93-17    0.001   0.003   0.002
93-43    4.237   3.394   3.074          93-18    0.001   0.008   0.006
93-54    3.675   3.376   3.902          93-49    0.009   0.007   0.010
93-61    2.403   1.511   3.293          93-68    0.005   0.001   0.040
93-64    3.413   1.117   2.878          93-69    0.000   0.004   0.007
93-78    1.926   1.171   2.644          93-80    0.005   0.014   0.013
93-92    2.094   1.457   2.658          93-89    0.002   0.004   0.008

Content in each line is expressed as a ratio relative to the content of γ-decalactone
in a reference sample containing a mix of all mapping lines for each year.



Sanchez-Sevilla et al. BMC Genomics 2014, 15:218
http://www.biomedcentral.com/1471-2164/15/218






each bulked pool were named H γ-DEC1-3 and N γ-DEC1-3
(for High and No γ-decalactone pool, respectively)
and used in the analysis. Total RNA was extracted
from pooled strawberry fruits using a differential
2-butoxyethanol precipitation-based method [26]. Prior to
reverse transcription, RNA was treated with DNase I
(Fermentas) to remove contaminating genomic DNA.
RNA quantity and quality were determined based on
absorbance ratios at 260 nm/280 nm and 260 nm/230 nm
using a Nanodrop. RNA integrity was confirmed by the
appearance of ribosomal RNA bands and lack of degradation
products after separation by agarose gel electrophoresis
and ethidium bromide staining. The integrity
of the RNA samples was further verified using the 2100
Bioanalyzer (Agilent, Folsom, CA); RIN values ranged
between 7.2 and 7.4 for the six samples.

For each of the 6 samples (2 bulks with 3 biological replicates),
one paired-end library with approximately
300 bp insert size was prepared using an in-house optimized
Illumina protocol at the Centro Nacional de
Análisis Genómico (CNAG) facilities. Libraries were sequenced
on Illumina HiSeq2000 lanes using 2 × 100 bp
reads. More than 30 million reads were generated for
each sample. Primary analysis of the data included base
calling and quality control, with an assurance that >80% of
all bases passing filter had a quality value of at least 30.
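To make the quality criterion concrete, the sketch below converts a Phred quality value into the corresponding base-calling error probability and computes the fraction of bases at or above a threshold. It illustrates the standard Phred definition only; the function names and the example quality values are hypothetical and are not part of the sequencing facility's pipeline.

```python
def phred_to_error_probability(q: float) -> float:
    """Standard Phred definition: Q = -10 * log10(P_error)."""
    return 10 ** (-q / 10)


def fraction_at_least(qualities: list[int], threshold: int = 30) -> float:
    """Fraction of base calls whose quality value is >= threshold."""
    return sum(q >= threshold for q in qualities) / len(qualities)


# A quality value of 30 corresponds to one expected error in 1,000 base calls.
print(phred_to_error_probability(30))

# Hypothetical per-base qualities for a very short read.
print(fraction_at_least([30, 35, 20, 40, 10]))
```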

Mapping RNA-seq reads to the reference genome and 
generation of read counts 

Raw RNA-seq reads were processed to remove low-quality 
nucleotides and aligned to the Fragaria vesca reference 
genome (v1.1) and CDS (v1.0) [23] using the program
TopHat v2.0.6 [27]. Default parameters of TopHat were 
used, allowing 40 multiple alignments per read and a 
maximum of 2 mismatches when mapping reads to the 
reference. The mapping results were then used to iden- 
tify "islands" of expression, which can be interpreted as 
potential exons. TopHat builds a database of potential 
splice junctions and confirms these by comparing the 
previously unmapped reads against the database of puta- 
tive junctions. 

The aligned read files were processed by Cufflinks 
v2.0.2 [28]. Reads were assembled into transcripts, their 
abundance estimated, and tests for differential expression 
and regulation between the samples were performed. Cuf- 
flinks does not make use of existing gene annotations 
during assembly of transcripts, but rather assembles a 
minimum set of transcripts that best describe the reads in 
the dataset. This approach allows Cufflinks to identify al- 
ternative transcription and splicing that are not described 
by pre-existing gene models [28]. The normalized RNA- 
seq fragment counts were used to measure the relative 
abundances of transcripts expressed as Fragments Per 
Kilobase of exon per Million fragments mapped (FPKM). 



Confidence intervals for FPKM estimates were calculated 
using a Bayesian inference method [28]. 
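As an illustration of the FPKM unit, the following sketch applies the plain definition to a single transcript. It is a simplification under stated assumptions: Cufflinks estimates abundances with a statistical model that handles multi-mapped fragments and effective transcript length, which this function does not attempt to reproduce. The function name and the example numbers are hypothetical.

```python
def fpkm(fragments: float, transcript_length_bp: float, total_mapped_fragments: float) -> float:
    """Fragments Per Kilobase of exon per Million fragments mapped.

    FPKM = fragments / (length in kb) / (mapped fragments in millions)
         = fragments * 1e9 / (length in bp * total mapped fragments)
    """
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Hypothetical example: 500 fragments on a 1,000-bp transcript
# in a library with 1 million mapped fragments.
print(fpkm(500, 1_000, 1_000_000))
```

Normalizing by both transcript length and library size is what allows expression values to be compared between genes and between samples.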

Comparison to reference annotation and differential 
expression analysis 

Once all short read sequences were assembled with
Cufflinks, the output GTF files were sent to Cuffcompare
along with a reference GTF annotation file, downloaded
from the Genome Database for Rosaceae (GDR;
Fragaria vesca Whole Genome v1.1 Assembly & Annotation,
http://www.rosaceae.org/). This classified each transcript
as known or novel. Cuffcompare produced a combined
GTF file, which was passed to Cuffdiff along with the
original alignment (.SAM) files produced by TopHat to
identify differentially expressed transcripts between the
two pools. The Cuffdiff algorithm then re-estimated the
abundance of transcripts listed in the GTF file using
alignments from the SAM file, and concurrently tested
for differential expression between the high γ-decalactone
and the no γ-decalactone pools using a rigorous statistical
analysis [28]. The significance scores were corrected for
multiple testing using the Benjamini-Hochberg correction.
Expression testing was done at the level of transcripts,
primary transcripts and genes. By tracking changes in
the relative abundance of transcripts with a common
transcription start site, Cuffdiff can also identify changes
in splicing.
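The Benjamini-Hochberg correction mentioned above can be sketched in a few lines. This is a generic textbook implementation that returns adjusted p-values (q-values); it is not Cuffdiff's internal code, and the input p-values are hypothetical.

```python
def benjamini_hochberg(p_values: list[float]) -> list[float]:
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of the q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four genes.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

A gene is then called differentially expressed when its adjusted value falls below the chosen false discovery rate.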

Visualization of mapped reads 

Mapping results were visualized using a local copy of 
the Integrative Genomics Viewer software available at 
http://www.broadinstitute.org/igv/. Views of individual 
genes were generated by uploading TopHat-generated 
files containing the sequence alignment data (.bam files) 
to the genome browser. 

Functional analysis of gene lists using BLAST2GO

The BLAST2GO v 2.4 suite was used for functional annotation
of sequences, data mining and gene set enrichment
analysis [29]. The functional clustering tool was
used to look for functional enrichment for genes over-
and under-expressed more than two-fold between the
pools. GO enrichment was derived with Fisher's exact
test and a cutoff of false discovery rate < 0.05 using the
F. vesca genome annotation as reference background. A
unique list of gene symbols was uploaded via the web
interface. Gene Ontology Biological Process was selected
as the functional annotation category for this analysis.
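The enrichment test described above can be illustrated with a one-sided Fisher's exact test, which is equivalent to the upper tail of a hypergeometric distribution. The sketch below is an assumption about the form of the test, not a reproduction of the BLAST2GO implementation (which may, for example, report two-sided values); the function name and the counts are hypothetical.

```python
from math import comb


def fisher_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided Fisher's exact test for over-representation.

    N: genes in the reference background
    K: background genes annotated with the GO term
    n: genes in the test list
    k: test-list genes annotated with the GO term
    Returns P(X >= k) for X ~ Hypergeometric(N, K, n).
    """
    upper = min(n, K)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Hypothetical example: 10 background genes, 5 carry the term,
# and all 5 genes in the test list carry it.
print(fisher_enrichment_p(5, 5, 5, 10))
```

The resulting p-values for all tested GO terms would then be corrected for multiple testing before applying the false discovery rate cutoff.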

De novo assembly of Fragaria x ananassa RNA-seq reads 

Since the current F. vesca genome sequence and
gene model are still a draft, some RNA-seq transcript
sequences appeared truncated. Therefore, we proceeded
to a de novo assembly of the reads corresponding to the
high- and no-γ-decalactone pools to obtain the full-length
transcripts expressed in F. × ananassa using
Trinity [30]. The transcript contigs most similar to the
F. vesca candidate genes were identified by means of a
BLAST search.

Sequence analysis 

Multiple sequence alignment was carried out with 
CLUSTALW at the default settings. Phylogenetic ana- 
lyses were conducted using the neighbor-joining algo- 
rithm and Poisson model in MEGA version 5 [31]. 
Protein targeting predictions were done using WoLF 
PSORT analysis (http://wolfpsort.org) and transmem- 
brane domain search with the TMpred program (http:// 
www.ch.embnet.org/software/TMPRED_form.html). 
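For reference, the Poisson model used for the phylogenetic analysis converts the observed proportion of differing amino acid sites, p, into an evolutionary distance d = -ln(1 - p), which corrects for multiple substitutions at the same site. The sketch below computes this distance for one pair of aligned sequences; it is a minimal illustration, not MEGA's implementation, and the function name and example sequences are hypothetical.

```python
import math


def poisson_distance(seq_a: str, seq_b: str) -> float:
    """Poisson-corrected distance between two aligned protein sequences.

    Alignment columns containing a gap ('-') in either sequence are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    p = sum(a != b for a, b in pairs) / len(pairs)
    if p >= 1.0:
        return math.inf  # the correction is undefined when every site differs
    return -math.log(1.0 - p)


# Hypothetical four-residue alignment with one difference (p = 0.25).
print(poisson_distance("AAAA", "AAAT"))
```

A matrix of such pairwise distances is the input from which the neighbor-joining algorithm builds the tree.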

Real time qRT-PCR analysis 

Total RNA was extracted from strawberry tissues as described
previously for the RNA-seq experiment. First-strand
cDNA was synthesized from 1 μg of total RNA
using the iScript cDNA Synthesis kit (Bio-Rad) according
to the manufacturer's instructions. Gene expression
was analyzed by quantitative real time polymerase chain
reaction (qRT-PCR) using the fluorescent intercalating
dye SYBR Green I in an iQ5 real-time PCR detection system
(Bio-Rad). Three biological replicates for each line
and three independent syntheses of cDNA for each RNA
sample were used for qRT-PCR. Relative quantification of
the expression levels for the target genes was performed
using the comparative Ct method [32]. The glyceraldehyde-
3-phosphate dehydrogenase gene (GAPDH) was used as
normalizing gene [33]. Primers are described in Additional
file 1: Table S1.
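The comparative Ct method can be summarized as a short calculation: the target gene Ct is first normalized to the reference gene (here GAPDH) within each sample, and the normalized value is then compared with that of a calibrator sample. The sketch below assumes amplification efficiencies close to 100% for both genes, as the basic method does; the function name and the Ct values are hypothetical.

```python
def relative_expression(
    ct_target_sample: float,
    ct_reference_sample: float,
    ct_target_calibrator: float,
    ct_reference_calibrator: float,
) -> float:
    """Relative expression by the comparative Ct (2^-ddCt) method."""
    delta_ct_sample = ct_target_sample - ct_reference_sample
    delta_ct_calibrator = ct_target_calibrator - ct_reference_calibrator
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2 ** (-delta_delta_ct)


# Hypothetical Ct values: the target amplifies 5 cycles earlier in the sample
# than in the calibrator, with identical reference gene Ct values.
print(relative_expression(22.0, 18.0, 27.0, 18.0))
```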

QTL and expression quantitative trait loci (eQTL) analysis 

QTL analyses were performed using MapQTL 5 as previously
described [6]. The raw relative data were analyzed
first by the nonparametric Kruskal-Wallis rank-sum test.
A stringent significance level of P = 0.005 was used as
threshold. Second, the integrated genetic linkage map
and transformed data sets for most traits were used to
identify and locate QTLs using Interval Mapping.
Significance LOD thresholds were estimated with a 1,000-
permutation test for each trait and QTLs with LOD
scores greater than the genome-wide threshold at 95%
were declared significant.
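The first step of the QTL analysis, the Kruskal-Wallis rank-sum test, compares trait values between the genotype classes at a single marker. The following sketch computes the H statistic with the usual correction for ties; it is a generic implementation rather than MapQTL's code, and the two groups of trait values are hypothetical.

```python
from collections import defaultdict


def kruskal_wallis_h(groups: list[list[float]]) -> float:
    """Kruskal-Wallis H statistic with tie correction."""
    values = sorted(v for g in groups for v in g)
    n = len(values)
    # Record the 1-based positions of each distinct value to assign average ranks.
    positions = defaultdict(list)
    for idx, v in enumerate(values, start=1):
        positions[v].append(idx)
    avg_rank = {v: sum(p) / len(p) for v, p in positions.items()}

    h = sum(sum(avg_rank[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12.0 / (n * (n + 1)) * h - 3.0 * (n + 1)

    ties = sum(len(p) ** 3 - len(p) for p in positions.values())
    correction = 1.0 - ties / (n ** 3 - n)
    if correction == 0.0:
        return 0.0  # all observations identical: no evidence of a difference
    return h / correction


# Hypothetical trait values for the two genotype classes at one marker.
print(kruskal_wallis_h([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]))
```

With two genotype classes, H is compared against a chi-square distribution with one degree of freedom to obtain the marker-wise significance.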

Results 

In order to increase the precision of the '232' × '1392'
map and identify new markers closely linked to the locus
controlling γ-decalactone, we first saturated the previous
map with DArT-derived SNP markers. The octoploid strawberry
homoeology group (HG) III, where the locus was previously mapped
[6], is presented in Figure 1 in comparison to the F. vesca
pseudochromosome 3 (fvesca_v1.1_pseudo.fna). The 4
homoeologous linkage groups (LGs), with lengths of
79.4, 106.2, 102 and 88.8 cM, could now be identified
instead of the 7 shorter LGs in the previous integrated map
[6]. The average marker density in HG III was increased
to 0.87 cM/marker and the largest gap ranged from
5.3 cM for LG III-1 to 7.7 cM for LG III-3 (Figure 1;
Additional file 2). The locus controlling γ-decalactone
was fine-mapped to the bottom of Fragaria × ananassa LG
III-2, closely linked to markers BFACT-45 and ChFvM140,
consistent with our previous data [6]. In addition, six
new SNP markers were mapped within ±3 cM
of the γ-decalactone locus (Figure 1; Additional file 2).

In order to identify the determinants of the variation
in γ-decalactone content in strawberry fruit, we aimed
to identify differentially expressed genes between pools
of fruits from lines contrasting in γ-decalactone content
in the '232' × '1392' population using RNA-seq. Differentially
expressed genes would then be analyzed for their
mapping position. Genes meeting both conditions,
i.e., highly expressed in fruits of high γ-decalactone
lines and located within the QTL interval,
would be considered for further analysis. RNA was extracted
from bulked pools of ripe fruits from 10 progeny
lines with high γ-decalactone content and from 10 lines
not producing the volatile (Table 1) and used in biological
triplicate for Illumina RNA sequencing. Sequencing reads
were aligned with TopHat [27] to the reference
Fragaria vesca Whole Genome (v1.1) and annotation
(CDS v1.0) ([23]; Genome Database for Rosaceae (GDR),
www.rosaceae.org). Over 218 million
100-bp reads were generated and, after removal of
adaptor sequences and low-quality reads, 211.6 million
clean reads remained (97% of the raw data). Between
68.7% and 70% of reads were paired for each of the 6 samples
and an average of 68.2% of filtered paired reads were
further mapped to the F. vesca genome. Some key metrics
that allowed the assessment of the quality of mapping
reads to the reference genome were extracted from the
TopHat output and log files, and are shown in Additional
file 1: Table S2.

After mapping the RNA-seq reads to the reference
genome, transcripts were assembled and their relative
abundances calculated using Cufflinks [28]. Genes with
normalized read counts lower than 0.1 fragments per kilobase
of exon per million fragments mapped (FPKM) were considered
as not expressed. A total of 33,458 genes/transcripts from
the two F. × ananassa pools were predicted based on the
reference gene model, and 19,833 and 19,720 were expressed
in the ripe fruits of the high γ-decalactone and the no
γ-decalactone pools, respectively.

Differential gene expression (DGE) between the high
γ-decalactone and the no γ-decalactone pools was

Figure 1 Comparison of pseudochromosome 3 (FV G-III) of the diploid Fragaria vesca with the four homoeologous linkage groups on
the integrated '232' × '1392' linkage map (LGIII-1-LGIII-4) of the octoploid F. × ananassa. The γ-decalactone locus is highlighted in orange
and linked SSR markers in blue. SSR and gene markers are highlighted in bold. Position of markers (in cM) is indicated on the left of the linkage
groups. For simplicity, only the position of anchor SSR markers (in Mb) is stated on the left of the F. vesca group.



calculated using the ratio of FPKM values of each gene in
both pools. A total of 617 predicted genes were differentially
expressed between the two pools and re-annotated
using Blast2GO [29]. Of these, 403 were up-regulated and
214 were down-regulated in the high-γ-decalactone pool
(Additional file 3). The observed ratios (log2 fold change)
of differential expression ranged from -5.161 to 3.489,
with negative and positive values indicating up- and
down-regulation in the high-γ-decalactone pool, respectively.
Only one gene (gene24970-v1.0-hybrid), encoding
a predicted protein with similarity to cinnamyl alcohol
dehydrogenase, was not expressed at all in the no-γ-decalactone
pool but was expressed in the high-γ-decalactone
pool, albeit at a relatively low level (3.79
FPKM). Among the 617 differentially expressed transcripts,
577 corresponded to annotated genes in the
F. vesca gene model v. 1.0 [23] while 40 matched
non-annotated genome regions. Among these, 28 corresponded
to predicted genes from F. vesca recently annotated
in the NCBI, while the remaining 12 transcripts have
not yet been annotated. Some gene families appeared
over-represented in the fruits with high γ-decalactone
content, such as cinnamyl alcohol dehydrogenases, with 6
differentially expressed genes, glutathione S-transferases,
with 7 up-regulated genes, and cytochrome P450, with 5
up-regulated members (Additional file 3).

Functional annotation and enrichment analysis 

In order to describe gene functions in a standard and
controlled vocabulary, we used the Blast2GO suite. A
total of 3,757 gene ontology (GO) terms were assigned
to a total of 559 differentially expressed genes, while 58
did not match any terms. Sequence distributions (at level
2, filtered by a cut-off of 60 sequences) for biological
processes, molecular functions and cellular components
are summarized in Additional file 1: Figure S1. Within
biological processes, the most abundant categories were
metabolic process (380 sequences; 27%), cellular
process (350 sequences; 25%) and response to stimulus
(210 sequences; 15%). The most represented molecular
function was catalytic activity (299 sequences; 49%) and
a large proportion of sequences (177; 22%) were associated
with membrane as cellular compartment.

To investigate the biological processes associated with
differences in γ-decalactone content, a GO enrichment
analysis was performed with Fisher's exact test, using
the sets of up-regulated and down-regulated transcripts
separately in comparison to those in the reference
F. vesca gene model. A total of 51 biological processes
were significantly enriched for the genes up-regulated
in fruits with high γ-decalactone content (Additional
file 1: Table S3). Most of these 51 common ontologies
are 'descendants' of 5 higher hierarchical nodes in the
GO tree: response to stimulus (GO:0050896), cellular
aromatic compound metabolic process (GO:0006725),
lipid homeostasis (GO:0055088), nitrogen compound
metabolic process (GO:0006807) and organic substance
metabolic process (GO:0071704). Within these biological
processes, the most significantly over-represented term
was oxidation-reduction process (GO:0055114, 92 genes
within this term). One possible interpretation is that enzymes
catalyzing the addition or removal of electrons are
needed in the biosynthesis of γ-decalactone in strawberry.

The number of biological processes significantly enriched
within the down-regulated genes (up-regulated in the
no-γ-decalactone pool) was higher, 101, and more
diverse (Additional file 1: Table S4). The three most
significantly over-represented terms were regulation of
biological quality (GO:0065008), response to biotic stimulus
(GO:0009607) and ion transport (GO:0006811). Globally,
these data suggest that the lack of γ-decalactone in
strawberry fruits is associated with a wide range of different
biological processes. Particularly interesting is the number
of genes up-regulated in the categories of response to
stimulus and biotic stress in the absence of γ-decalactone.

Identification of FaFAD1 as the gene underlying the locus
controlling γ-decalactone content in LG III-2

The top 25 up-regulated genes in the pool with high
γ-decalactone content are listed in Table 2. The gene with the
third highest up-regulation between the pools
corresponds to the Fragaria vesca gene ID
gene24414-v1.0-hybrid (gene24414), with homology to fatty acid
desaturases (FAD) and in particular to the microsomal
Δ-12 oleate desaturase (FAD2). This gene shows a high
expression in the pool of fruits with γ-decalactone
(325.92 FPKM) and its expression is ~30-fold higher (4.8
log2-fold) than in the pool of fruits without γ-decalactone
(11.41 FPKM; Table 2; Additional file 3). Most interestingly,
gene24414 maps to the end of pseudo-molecule
3 of the Fragaria vesca reference genome, at the exact
position where the gene controlling γ-decalactone has
been mapped in F. × ananassa (Figure 1). In addition to
gene24414, only 3 other genes among the 617 differentially
expressed genes mapped to the γ-decalactone content
locus. Two of them, gene24411 and gene24415, lie
about 34 and 9 kb from gene24414, respectively, and a
third, gene14386, is separated from it by 808 kb. However, the
fold-change between the bulked pools for these 3 genes was
much smaller than for gene24414, ranging from 0.5 to 0.9
log2-fold. Two of these genes, 24411 and 14386, with
similarity to callose synthase and glycerophosphoryl diester
phosphodiesterase, respectively, were down-regulated in
the high γ-decalactone pool. The third gene, 24415, with
highest similarity to the nucleolar complex protein 2,
showed higher expression in the high γ-decalactone pool
(Additional file 3).
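The fold change quoted for gene24414 follows directly from the two FPKM values given above. The short sketch below reproduces the arithmetic; the function name is hypothetical, and the sign of the result depends only on which pool is placed in the numerator.

```python
import math


def log2_fold_change(fpkm_numerator: float, fpkm_denominator: float) -> float:
    """log2 of the ratio between two FPKM values."""
    return math.log2(fpkm_numerator / fpkm_denominator)


# FPKM values reported in the text for gene24414.
fpkm_high_pool = 325.92
fpkm_no_pool = 11.41

fold = fpkm_high_pool / fpkm_no_pool
print(f"fold change = {fold:.1f}")
print(f"log2 fold change = {log2_fold_change(fpkm_high_pool, fpkm_no_pool):.2f}")
```

The ratio is close to 29, in line with the approximately 30-fold (4.8 log2-fold) difference stated in the text. Reversing the two arguments gives the same magnitude with a negative sign, which matches the convention used in this study, where negative values indicate up-regulation in the high γ-decalactone pool.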

All of the reads from the high γ-decalactone pool
mapping to gene24414-v1.0-hybrid (hereafter named
FaFAD1 and FvFAD1 for the cultivated and wild strawberry,
respectively) corresponded to one unique allele,
indicating that only one allele is expressed in fruits of
the 10 selected siblings. Similarly, all the reads from the
no-γ-decalactone pool corresponded to the same allele
as that found in the fruits with high content.

The predicted gene24414 in the strawberry draft genome
sequence [23] contains 5 exons. However, when
visualized using the Integrative Genomics Viewer (IGV),
the reads of the RNA-seq experiments only spanned the
first two exons, indicating that FvFAD1 was not correctly
annotated in the strawberry genome sequence. A different
model of the same gene that only spans the first two
exons is available in the NCBI under accession number
LOC101309231, and is predicted to encode a protein of
393 amino acids. However, the last 16 amino acids of
the carboxylic end of the predicted protein do not have
similarity to reported FAD2 proteins (data not shown).
In order to unequivocally determine the Fragaria ×
ananassa FAD1 transcript, we performed a de novo assembly
of the RNA-seq reads from the high γ-decalactone
pool replicates. Only one contig of 1847 bp with high
similarity (excluding 25 bp of the 3' end of the ORF and
the 3'UTR) to the F. vesca FvFAD1 gene was obtained,
and contained an ORF of 1125 bp encoding a predicted
protein of 375 amino acid residues. In this prediction,
the C-terminus shares high similarity to FAD2 proteins
except that FaFAD1 lacks the last two amino acids
(Figure 2). However, the lack of these two amino acids
is also found in the predicted amino acid sequence
encoded by a F. × ananassa EST (accession number



Table 2 List of the top 25 significantly up-regulated genes in the high γ-decalactone pool compared to the no-γ-decalactone pool

Gene | Locus position | Predicted function | FPKM high γ-decalactone pool | FPKM no-γ-decalactone pool | log2 fold change | Test statistic | p-value | q-value
[illegible] | [illegible] | Cinnamyl alcohol dehydrogenase-like | [illegible] | [illegible] | [illegible] | [illegible] | [illegible] | [illegible]
gene22145-v1.0-hybrid | [illegible] | Aldo keto reductase | [illegible] | [illegible] | [illegible] | [illegible] | [illegible] | [illegible]
gene24414-v1.0-hybrid | [illegible] | Microsomal delta-12 oleate desaturase | [illegible] | [illegible] | [illegible] | [illegible] | [illegible] | [illegible]
gene17831-v1.0-hybrid | [illegible] | Isoflavone 2'-hydroxylase-like | [illegible] | [illegible] | -4.10 | [illegible] | [illegible] | [illegible]
[illegible] | unanchored | NA | [illegible] | [illegible] | [illegible] | [illegible] | [illegible] | [illegible]
gene12565-v1.0-hybrid | [illegible] | S-norcoclaurine synthase-like | [illegible] | [illegible] | [illegible] | [illegible] | [illegible] | [illegible]
[illegible] | [illegible] | Thaumatin-like protein | [illegible] | [illegible] | [illegible] | [illegible] | [illegible] | [illegible]
[illegible] | unanchored | Salicylic acid-binding protein 2-like | [illegible] | [illegible] | [illegible] | [illegible] | [illegible] | [illegible]
[illegible] | [illegible] | Hypothetical protein | [illegible] | [illegible] | [illegible] | [illegible] | [illegible] | [illegible]
[illegible] | [illegible] | Auxin-binding protein abp19a-like | [illegible] | [illegible] | [illegible] | [illegible] | [illegible] | [illegible]
[illegible] | [illegible] | Probable glutathione S-transferase-like | [illegible] | [illegible] | [illegible] | [illegible] | [illegible] | [illegible]
[illegible] | [illegible] | Probable glutathione S-transferase-like | [illegible] | [illegible] | [illegible] | [illegible] | [illegible] | [illegible]
[illegible] | [illegible] | Beta-glucanase | [illegible] | [illegible] | [illegible] | [illegible] | [illegible] | [illegible]
- | [illegible] | Hypothetical protein | [illegible] | [illegible] | [illegible] | [illegible] | [illegible] | [illegible]
[illegible] | unanchored | Pathogenesis-related protein [illegible] | [illegible] | [illegible] | [illegible] | [illegible] | [illegible] | [illegible]
- | [illegible] | NA | [illegible] | [illegible] | [illegible] | [illegible] | [illegible] | [illegible]
- | unanchored | NA | 77.57 | 13.29 | -2.55 | 3.37 | 0.000739 | 0.026493
gene13265-v1.0-hybrid | LG7:21691207-21692009 | Par-1a protein | 3.39 | 0.59 | -2.53 | 4.10 | 4.17E-05 | 0.002653
- | LG4:2760209-2761669 | Inhibitor of trypsin and Hageman factor | 4.34 | 0.76 | -2.51 | 3.41 | 0.000656 | 0.024349
gene28799-v1.0-hybrid | LG3:9340369-9341925 | Nectarin-3-like | 2.84 | 0.52 | -2.45 | 4.74 | 2.12E-06 | 0.000217
gene32603-v1.0-hybrid | LG4:2356895-2357929 | Sieve element-occluding protein | 1.82 | 0.34 | -2.40 | 3.48 | 0.000503 | 0.019820
gene02395-v1.0-hybrid | LG3:5514062-5518181 | Cytochrome p450 | 106.92 | 20.98 | -2.35 | 19.15 | 0 | 0
gene21028-v1.0-hybrid | LG7:17655017-17664058 | Psbp domain-containing protein chlorop-like | 1.80 | 0.36 | -2.33 | 4.37 | 1.24E-05 | 0.000952
gene17832-v1.0-hybrid | LG1:12556715-12561321 | Hypothetical protein | 19.76 | 4.11 | -2.26 | 5.07 | 3.9E-07 | 4.78E-05
- | LG3:10391258-10391650 | NA | 14.68 | 3.06 | -2.26 | 3.78 | 0.000156 | 0.007817

Gene number according to the Fragaria vesca genome draft (www.Strawberrygenome.org). For a number of transcripts detected by cufflinks (-), no reference transcript was predicted in the Fragaria vesca gene model.

Figure 2 Comparison of the amino acid sequence of strawberry FaFAD1 with other plant fatty acid desaturases (Δ12-FADs) and
hydroxylases (Δ12-FAH). Identical amino acid residues are indicated with a black background. Dark and light gray shading indicate 80% and 60%
or more conservation among all the aligned sequences, respectively. The predicted transmembrane domains (TM helices) and highly conserved
His boxes are shown. The seven amino acid residues that differ between oleate desaturases and hydroxylases according to [45] (numbering based
on the Arabidopsis FAD2 sequence) are indicated in green when the amino acid residue is conserved between FaFAD1 and hydroxylases and in red
when not. Accession numbers for the sequences were as follows: Fragaria × ananassa FaFAD1 (KF887973), Nicotiana tabacum NtFAD (AAT72296),
Olea europaea OeFAD2-1 (AAW63040), Davidia involucrata DiFAD2 (ABZ05022), Crepis alpina CaFAD2-2 (ABC00770), Arabidopsis thaliana AtFAD2
(AAA32782), Prunus persica PpFAD1B-6 (AGM53489), Ricinus communis hydroxylase RcFAH12 (AAC49010), Malus domestica Md_predicted FAD
(MDP0000288297), Lesquerella fendleri bifunctional hydroxylase/desaturase LfFAH12 (AAC32755).



CO381851) available in the NCBI, indicating that this
might be the correct sequence. The full-length ORF of
FaFAD1, including a 3' untranslated region, was cloned
from parental line '1392' using primers deduced from
the transcriptome assembly, which confirmed the predicted
sequence. In agreement with the previous results obtained
using the reference F. vesca genome, the same FaFAD1 allele
was obtained after de novo assembly of the RNA-seq reads
from the no-γ-decalactone pool.
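
The ORF and protein lengths reported for the assembled contig can be cross-checked with simple codon arithmetic. The sketch below is illustrative only: the helper name is hypothetical, and it assumes that the 1125 bp figure excludes the stop codon (if the stop codon were counted, the same ORF would encode 374 residues).

```python
def protein_length(orf_bp: int, includes_stop: bool = False) -> int:
    """Number of amino acids encoded by an ORF of the given length in bp."""
    if orf_bp <= 0 or orf_bp % 3 != 0:
        raise ValueError("ORF length must be a positive multiple of 3")
    codons = orf_bp // 3
    # A stop codon occupies one codon but encodes no residue.
    return codons - 1 if includes_stop else codons


# ORF length of the de novo assembled FaFAD1 contig, as quoted in the text.
print(protein_length(1125))
```

Under that assumption the result matches the 375 residues quoted for the predicted FaFAD1 protein.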

The deduced FaFAD1 protein sequence contains the
Delta12 Fatty Acid Desaturase (Delta12-FADS)-like conserved
domain (E-value: 1.74e-56). Membrane FADs are
non-heme, iron-containing, oxygen-dependent enzymes
involved in the regioselective introduction of double bonds
in fatty acyl aliphatic chains. These enzymes are responsible
for the synthesis of 18:2 fatty acids in the endoplasmic
reticulum. Six putative transmembrane domains are
predicted within FaFAD1 using the TMpred program, as
expected for an integral membrane protein (Figure 2).
Alignment with other characterized FAD2 proteins indicated
that the characteristic His-rich motifs, which
contribute to the interaction with the electron donor
cytochrome b5, were conserved in the deduced
FaFAD1 protein. The most similar protein to FaFAD1 in
Arabidopsis was the endoplasmic reticulum-localized oleate
desaturase FAD2, which catalyzes the conversion of oleic
acid (18:1) to linoleic acid (18:2) [34].



Highly similar proteins to FaFAD1 were identified after
a blastp search in the NCBI and Phytozome databases
(www.phytozome.net). As shown in Figure 3, the phylogenetic
analysis indicates the presence of two major
clusters. As expected, the FaFAD1 protein was most
similar to the F. vesca predicted protein encoded by
gene24414, located at the bottom of chromosome 3, and
also to a predicted FAD protein from the closely related
Malus genus. Interestingly, this group of protein sequences
grouped with the Ricinus communis fatty acid
hydroxylase RcFAH12. This protein, which shares high
sequence similarity to desaturases, has been shown to
catalyze the hydroxylation of oleate to produce the
hydroxy fatty acid ricinoleate [35]. A recently identified
FAD from Prunus persica, PpFAD1B-6, has been associated
with lactone content in peach fruits using an integrative
'omics' approach [14]. This protein grouped in a
second cluster with the Ricinus communis desaturase
RcFAD2 and other reported desaturases, such as the
Arabidopsis AtFAD2.

Figure 3 Phylogenetic tree showing protein sequence
relationships among selected FAD2 members from different
species. Bootstrap values (%) for 1000 replicates are indicated at
the nodes. The positions of the Fragaria, Malus and Prunus amino acid
sequences are labeled in dark red, light red and orange, respectively.
Accession numbers are shown in parentheses: Prunus persica PpFAD1B-6
(AGM53489), Fragaria vesca Fv_predicted FAD (LOC101290788),
Arabidopsis thaliana AtFAD2 (AAA32782), Persea americana PaFAD2
(AAL23676), Glycine max GmFAD2-2B (BAD89863), Linum usitatissimum
LuFAD2-2 (ACF49507), Ricinus communis RcFAD2 (ABK59093),
Theobroma cacao Tc_predicted FAD (EOY09487), Punica granatum
PgFAD2 (AAO37754), Xanthoceras sorbifolia XsFAD2 (AGO32050),
Olea europaea OeFAD2-1 (AAW63040), Sesamum indicum SiFAD2
(AAF80560), Solanum lycopersicum Sl_predicted FAD (XP_004228665),
Nicotiana tabacum NtFAD (AAT72296), Davidia involucrata DiFAD2
(ABZ05022), Glycine max GmFAD2-1B (BAD89861), Crepis alpina
CaFAD2-2 (ABC00770), Ricinus communis RcFAH12 (AAC49010),
Malus domestica Md_predicted FAD (MDP0000288297), Fragaria ×
ananassa FaFAD1 (KF887973) and Fragaria vesca Fv_predicted
FAD1 (LOC101309231).

To further investigate whether the down-regulation of
FaFAD1 is the cause of the extremely low γ-decalactone
content in strawberry fruits, we first validated the differential
expression observed in the pools by qRT-PCR. As
shown in Additional file 1: Figure S2A, the expression of
FaFAD1 was ~30-fold higher in the high-γ-decalactone
pool, the same differential expression obtained using
RNA-seq (Table 2). Quantitative RT-PCR also validated
the RNA-seq data for three other up-regulated genes
(see below; Additional file 1: Figure S2).
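
Fold differences such as those quoted here and the log2 fold change column of Table 2 are two views of the same ratio. As a minimal sketch (the function names are illustrative and not part of the original analysis, and the sign convention of no-pool over high-pool is inferred from the negative values in Table 2), the conversion can be checked in Python with FPKM values printed in the table:

```python
from math import log2


def log2_fold_change(fpkm_high: float, fpkm_no: float) -> float:
    """log2 of the no-pool FPKM over the high-pool FPKM.

    Assumed sign convention: negative values mean higher expression
    in the high γ-decalactone pool.
    """
    if fpkm_high <= 0 or fpkm_no <= 0:
        raise ValueError("FPKM values must be positive to take a log ratio")
    return log2(fpkm_no / fpkm_high)


def fold_from_log2(log2_fc: float) -> float:
    """Convert a log2 fold change into an unsigned fold difference."""
    return 2 ** abs(log2_fc)


# FPKM values as printed in Table 2: (high pool, no pool).
rows = {
    "gene02395-v1.0-hybrid": (106.92, 20.98),
    "gene28799-v1.0-hybrid": (2.84, 0.52),
}
for gene, (high, no) in rows.items():
    lfc = log2_fold_change(high, no)
    print(f"{gene}: log2 FC = {lfc:.2f}, fold = {fold_from_log2(lfc):.1f}")
```

For other rows the recomputed value can differ in the second decimal from the printed one, because the FPKM values in the table are themselves rounded.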

We next examined the expression level of the gene by
qRT-PCR in the complete '232' × '1392' population,
composed of 93 siblings and the two parental lines. Fifty-one
lines (~54%) showed no detectable expression of FaFAD1,
with threshold cycles (Ct) similar (± 2) to the no-template
control and relative expression levels ranging from 0.01 to
0.58 (Figure 4). The rest of the lines showed different
levels of expression that ranged from 7 to 247 times the
average expression in the population (Figure 4). Furthermore,
co-segregation between high/no FaFAD1 transcription
and γ-decalactone content was observed (Figure 4;
Additional file 1: Figure S4A). Strikingly, the primers used
for qRT-PCR of FaFAD1 failed to amplify from genomic DNA
of each line producing fruits without γ-decalactone
(Additional file 1: Figure S3A).
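
The detection criterion described above (a Ct within 2 cycles of the no-template control is treated as no detectable expression) can be sketched as follows. The Ct values in the example are hypothetical and chosen only for illustration; the ±2 window and the 51-of-95 count are the only figures taken from the text.

```python
def is_detected(ct_sample: float, ct_no_template: float, window: float = 2.0) -> bool:
    """True when the sample amplifies clearly earlier than the no-template control."""
    return ct_sample < ct_no_template - window


# Hypothetical Ct values, for illustration only.
ct_ntc = 35.0
example_cts = {"line_A": 34.2, "line_B": 24.8, "line_C": 36.1}
calls = {line: is_detected(ct, ct_ntc) for line, ct in example_cts.items()}
print(calls)

# Share of the 95 lines without detectable FaFAD1 expression (from the text).
print(f"{51 / 95:.0%}")
```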

Several FAD2 genes show seed-specific expression,
with a role in seed storage fatty acids, while others
are ubiquitously expressed and are thus involved in
the general biosynthesis of membrane fatty acids [36].
This correlation between expression and function in
other desaturases prompted us to analyze the expression
pattern of FaFAD1 in order to determine whether it is
correlated with the induction of γ-decalactone production
during the last stages of fruit ripening. The expression
in different tissues and during fruit ripening was
analyzed by qRT-PCR in the commonly cultivated
cultivar 'Chandler'. In this cultivar, the expression level of
FaFAD1 in red fruits was similar to that of line '1392'
(Additional file 1: Figure S3B, C), consistent with both
genotypes producing γ-decalactone in ripe fruits. In contrast,
expression of FaFAD1 was not detected in '232'
and 'Camarosa' by qRT-PCR, consistent with neither
genotype producing γ-decalactone. As shown in Figure 5A,
FaFAD1 expression increased ~150-fold between
white and red fruit, consistent with the biosynthesis of
γ-decalactone during the late stages of fruit ripening.
Supporting a specific role of FaFAD1 in ripe fruits, no
expression was detected in leaves, and very low expression
of FaFAD1 was detected in roots and in green and
white fruits.

Figure 4 Comparison of expression of FaFAD1 in the 232 × 1392 mapping population by real-time qRT-PCR with γ-decalactone content
in fruits. FaFAD1 expression (black bars) is expressed relative to the average expression in the population and is indicated on the Y-axis to the
left. The production of γ-decalactone, scored as presence/absence (gray dots), is indicated on the Y-axis to the right.

Collocation of QTL for γ-decalactone content and eQTL
for candidate genes

Based on their predicted function, we selected two additional
candidate genes for further analyses from the
top 25 highly up-regulated genes (Table 2). The fourth
transcript in the list, gene17831-v1.0-hybrid, showed
17-fold (4.1 log2 fold change) higher expression in the
high-γ-decalactone pool (Table 2; Additional file 3). The
predicted protein sequence has high similarity to
cytochrome P450, family 81, and contains the p450
superfamily conserved domain (E-value 8.37e-96) and the
PLN02183 5-hydroxylase multidomain (E-value 1.22e-70).
The gene02395-v1.0-hybrid, at the 23rd position in
Table 2, showed a 5-fold up-regulation and encodes
a predicted protein with high similarity to cytochrome
P450, family 79, subfamily A, and thus also
contains the p450 superfamily conserved domain (E-value
3.60e-40) and several hydroxylase domains such as
PLN03018 (E-value 6.14e-125). The differential gene
expression observed for these two transcripts was validated
by qRT-PCR, which gave 14.8- and 6.5-fold changes in
expression between the pools for gene17831 and
gene02395, respectively (Additional file 1: Figure S2).
Since these two genes encode protein sequences
with high homology to CYP hydroxylases, we further
investigated their possible association with γ-decalactone
biosynthesis by analyzing their expression in the parental
and all progeny lines of the '232' × '1392' mapping
population (Additional file 1: Figure S4). A significant
correlation between the transcript level of the
F. × ananassa gene corresponding to gene17831-v1.0-hybrid
(hereafter referred to as FaFAH1) and γ-decalactone
content was observed (Pearson correlation = 0.45). In
contrast, no association between the transcript level
of gene02395-v1.0-hybrid (hereafter referred to as FaCYP1)
and γ-decalactone content was observed. Furthermore,
expression profiling in different tissues by qRT-PCR
showed that FaFAH1 is expressed in leaf and ripe fruits
while FaCYP1 was highly expressed in leaf and to a
much lesser extent in green fruit (Figure 5B, C). These
results are consistent with FaFAH1, but not FaCYP1,
having a possible role in γ-decalactone accumulation in
ripening strawberries.
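
For reference, a Pearson correlation such as the one reported above for FaFAH1 can be computed as in this minimal sketch. The expression and content values are invented for illustration and are not data from the study; the function only implements the standard formula.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


# Hypothetical per-line values: relative transcript level and volatile content.
expression = [0.1, 0.4, 1.2, 2.5, 3.1, 0.8]
content = [0.0, 0.3, 0.2, 1.9, 1.1, 1.4]
print(round(pearson_r(expression, content), 2))
```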

Figure 5 Expression profiles of candidate genes in different
tissues and during fruit ripening determined by real-time

Transcript expression levels measured for genes in a
mapping population allow them to be treated as traits
for gene expression QTL (eQTL) analysis. The locations
of eQTL that regulate gene expression can be correlated
with those of QTL for traditional phenotypic traits and
so provide additional clues as to the genetic basis of
quantitative genetic variation [37]. The complete correlation
observed between high or no FaFAD1 expression
and γ-decalactone content (Figure 4; Additional file 1:
Figure S4A) indicates that an eQTL controlling the
expression of FaFAD1 collocates with both the position of
the gene and the phenotypic trait. In order to evaluate
this prediction and to test whether the differential
expression of FaFAH1 or FaCYP1 could be correlated with
the position of the locus controlling γ-decalactone content,
we next used the qRT-PCR quantitative data for
eQTL analysis of the three genes. As a positive control,
we re-analyzed the QTL for γ-decalactone content [6].
Interestingly, QTL analysis performed with both the
non-parametric Kruskal-Wallis test and interval mapping
using the integrated map of [6] resulted in the
identification of the same eQTL for both FaFAD1 and
FaFAH1 expression (Figure 6; Additional file 1: Table S5).
The eQTL for FaFAD1 expression, as expected, collocated
with both the position of the QTL controlling
γ-decalactone content and the position of the gene
FaFAD1 at the bottom of LG III-2 and accounted for
90% of the variation (Additional file 1: Table S5).
Interestingly, an eQTL controlling 55% of the variation in the
expression of FaFAH1 was also detected in LG III-2, at
exactly the same position where FaFAD1 (and the locus
controlling γ-decalactone content) is mapped. The position
of FaFAH1 is predicted to be in one LG of HG I
based on the F. vesca genome sequence (Table 2), implying
that the eQTL at LG III, which most likely is the gene
FaFAD1, is regulating the expression of FaFAH1. When
we analyzed the expression of FaCYP1, an eQTL was
detected at the top of a different LG belonging to the
same homoeology group III (Figure 6; Additional file 1:
Table S5). The position of the eQTL for FaCYP1 matches
the position of the gene based on the F. vesca genome
sequence (Table 2). These data are consistent with the lack
of association previously found between the transcript
level of FaCYP1 and γ-decalactone content.
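
As an illustration of the single-marker Kruskal-Wallis test mentioned above, the sketch below computes the H statistic for expression values grouped by marker genotype. It is not the mapping software used in the study: the genotype grouping and most of the expression values are hypothetical (only the extremes 0.01, 0.58, 7 and 247 echo the ranges quoted earlier), and neither a tie correction nor a p-value is included.

```python
from itertools import chain


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction) for a list of groups."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)
    total = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)
    return 12.0 / (n * (n + 1)) * total - 3 * (n + 1)


# Hypothetical relative expression values split by marker genotype class.
marker_absent = [0.01, 0.05, 0.20, 0.58]
marker_present = [7.0, 35.0, 120.0, 247.0]
print(round(kruskal_wallis_h([marker_absent, marker_present]), 2))
```

A large H relative to the chi-square distribution with (number of groups minus one) degrees of freedom indicates that expression differs between the genotype classes at that marker.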

Discussion 

Two bulked pools of segregants representing the phenotypic
extremes within a relatively large population displaying
wide variation for a given trait would only differ
at the locus controlling the trait. Although bulked segregant
analysis (BSA) has generally been used to tag genes
controlling Mendelian traits, the method can also be
used to identify major QTL [38]. The applicability of
BSA to RNA-seq was recently demonstrated by mapping
the maize mutant gene gl3 [39]. Here we report the
combination of BSA and RNA-seq as a powerful and
valid approach for quantifying differential transcript
expression and for cost-efficient identification of genes
underlying γ-decalactone variation in cultivated strawberry.
Once we had fine-mapped the locus to the bottom
of chromosome 3, the assumptions made for candidate
genes were that (1) the gene must show low or no
expression in fruits without γ-decalactone and high
expression in fruits producing this VOC, and (2) the
gene must encode an enzyme involved in the biosynthesis
of this volatile, based on the proposed pathways,
or a regulatory protein. Out
of the 33,458 analyzed transcripts, only gene24414 fulfilled
both requirements. This gene encodes a protein,
FaFAD1, with extensive similarity to delta-twelve
fatty acid desaturases, enzymes that catalyze the regioselective
introduction of a double bond at the Δ12
position during lipid biosynthesis [34]. Therefore, the
activity of this protein could supply fatty acid precursors
for lactone biosynthesis. The sequence alignment of
FaFAD1 with other desaturases revealed the presence of
three conserved histidine boxes reported to be essential
for catalysis and proposed to be the ligands for the
iron atoms involved in the formation of the di-iron-oxygen
complex. Interestingly, the deduced FaFAD1
protein is shorter than the other FAD proteins, and
neither the dilysine nor the aromatic amino acid-enriched
retrieval signal (-YKNKF) is present at the
C-terminus of FaFAD1 (Figure 2). One of these motifs
is necessary for maintaining localization of the enzymes
in the endoplasmic reticulum (ER) [40]. However, a
PSORT algorithm (http://wolfpsort.org) predicts that
FaFAD1 is targeted to the ER with a certainty of 8.0,
consistent with the six transmembrane domains predicted
for FaFAD1.

In addition to desaturase activity, a number of FAD2
variants are known to possess diversified functionalities,
catalyzing hydroxylations, epoxidations, or the formation
of acetylenic and conjugated double bonds [35,41,42].
Some other FAD2 enzymes have bifunctional hydroxylase/desaturase
or even tri-functional activities [43,44]. A
close homologue of FaFAD1 in peach, PpFAD1B-6, has
been proposed to be involved in lactone production in
fruits [14]. This enzyme inserts a double bond between
carbons 12 and 13 of monounsaturated oleic acid to generate
polyunsaturated linoleic acid, but does not have any
detectable hydroxylase activity. However, FaFAD1 is
phylogenetically located in a different clade and is more
closely related to the castor bean hydroxylase RcFAH12
[35]. Seven amino acid residues that differ between oleate
desaturases and hydroxylases have been identified,
and the substitution of alanine 148 and methionine 324
of the Arabidopsis AtFAD2 by isoleucines, as found in
RcFAH12 or the Lesquerella fendleri hydroxylase/desaturase
(LfFAH12), caused a substantial shift in catalytic activity
[45,46]. Interestingly, these two isoleucines are conserved
between FaFAD1 and hydroxylases, suggesting
that the strawberry gene could encode a bifunctional
enzyme (Figure 2).

The expression profiling of FaFAD1 in different tissues
showed that the gene is highly expressed in, and specific to,
red fruit of lines with high γ-decalactone content. Therefore,
the expression is highly correlated with γ-decalactone
biosynthesis, which occurs at the late stages of fruit ripening
[4]. In addition, the correlation of FaFAD1 expression
with γ-decalactone content in the mapping population,
the coincident map positions of the γ-decalactone locus and
FaFAD1, and the predicted enzymatic activity of the FaFAD1
protein indicate that this gene is responsible for the natural
variation of this VOC in strawberry. Furthermore, it
can be stated that the absence or extremely low levels of
γ-decalactone in fruits of half of the population lines is a
consequence of the absence or extremely low levels of
FaFAD1 expression in these lines. The same FaFAD1 allele
was detected in both bulked pools either using the reference
genome to map the reads or after de novo assembly.
The differential expression of FaFAD1 observed between
the pools was similar using both methods (Additional
file 1: Table S6) and was also validated by qRT-PCR
(Additional file 1: Figure S2; Table S6). However, when
the progeny lines were analyzed independently, FaFAD1
expression was not detected by qRT-PCR in fruits without
γ-decalactone. Furthermore, different FaFAD1 primer
pairs failed to amplify from genomic DNA of these
lines (Additional file 1: Figure S3; see also companion
manuscript), suggesting that the FaFAD1 gene may not
be present in their genome. Taking these results together,
the most plausible explanation for the FaFAD1 reads
detected in the no-γ-decalactone pool is that the
no-γ-decalactone pools had some contamination during
processing with some fruits containing the volatile.

Two other candidate genes were studied for their potential
contribution to γ-decalactone production based
on their increased expression in the high γ-decalactone
pool and their annotated enzymatic activity. While FaCYP1
was not associated with γ-decalactone content, the gene
FaFAH1 was up-regulated during fruit ripening. Our
eQTL analysis of FaFAD1 and FaFAH1 indicates that both
are associated with γ-decalactone. While the association
of FaFAD1 expression with γ-decalactone is complete,
FaFAH1 shows a high but incomplete association with
γ-decalactone. When an eQTL maps to the same genetic
location as the gene whose transcript is being measured,
as is the case for FaFAD1, it is generally caused by
cis-acting regulatory polymorphisms in the gene (cis-eQTL),
most probably a polymorphism in the promoter region, which
in turn gives rise to differential expression. In contrast, eQTL
that do not map to the location of the gene being assayed,
such as that for FaFAH1, most likely represent trans-acting
regulators (trans-eQTL) that may control the expression
of a number of genes elsewhere in the genome [37]. Based
on the predicted function of FaFAD1 and FaFAH1, we
propose that the pathway for γ-decalactone biosynthesis
in fruits proceeds through hydration of unsaturated fatty
acids. In this proposed model, the enzyme FaFAD1 would
catalyze the conversion of oleic acid (18:1) to linoleic acid
(18:2) by the introduction of a double bond at the Δ12
position, as performed by other FAD2 enzymes. Additionally,
FaFAD1 may possess hydroxylase activity, catalyzing
the hydroxylation of oleic acid to ricinoleic acid. The fact
that an eQTL for FaFAH1 expression was detected at the
position where FaFAD1 maps suggests that FaFAD1, or
most likely the product of FaFAD1 activity (i.e. linoleic
acid), up-regulates the expression of FaFAH1, which may
encode the enzyme catalyzing the next reaction in the
biosynthetic pathway. This reaction is most probably a
hydroxylation, although some CYP-related enzymes have
been shown to have epoxidase activity [47]. The ricinoleic
acid derivative is then shortened by four β-oxidation
cycles to form the corresponding 4-hydroxy acid. The
last step in γ-decalactone biosynthesis involves the
cyclization of the molecule either by an enzyme with alcohol
acyl-transferase activity or by spontaneous lactonisation
under acidic conditions [48].
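
The carbon bookkeeping behind this proposed route can be made explicit. In the sketch below (the function name is illustrative), each β-oxidation cycle removes two carbons from the carboxyl end, so both the chain length and the position of the hydroxyl group, counted from the carboxyl carbon, decrease by two per cycle. Starting from ricinoleic acid (C18, hydroxyl at C12), four cycles give a C10 acid hydroxylated at C4, the 4-hydroxy acid that cyclizes to γ-decalactone.

```python
def beta_oxidation(chain_length: int, hydroxyl_position: int, cycles: int):
    """Return (chain length, hydroxyl position) after the given number of cycles.

    Positions are counted from the carboxyl carbon (C1).
    """
    removed = 2 * cycles  # two carbons are released per cycle
    if hydroxyl_position - removed < 2:
        raise ValueError("too many cycles for this hydroxyl position")
    return chain_length - removed, hydroxyl_position - removed


# Ricinoleic acid: 18 carbons, hydroxyl group at C12.
print(beta_oxidation(chain_length=18, hydroxyl_position=12, cycles=4))
```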

Conclusions 

Understanding the basis of volatile organic compound
(VOC) biosynthesis and regulation is of utmost importance
for the genetic improvement of fruit flavor. This
study provides genetic and molecular data on how the
content of γ-decalactone is naturally controlled in strawberry
and highlights enzymatic activities necessary for
the formation of this VOC in fruits. γ-Decalactone has
been shown to be a sensorially important VOC for strawberry
flavor [17,49]. However, other important functions
of volatiles are to defend plants against pathogens, to
attract pollinators, seed dispersers, and other beneficial
animals and microorganisms, and to serve as signals in
plant-plant interactions [50]. GO enrichment analysis for
the genes up-regulated in fruits without γ-decalactone
detected a significant enrichment in GO categories related
to response to pathogens. One plausible explanation is
that this lactone could have anti-pathogen activity and that,
in its absence, up-regulation of other biotic stress response
mechanisms would compensate for the lack of this VOC.
In this context, γ-decalactone has been shown to be toxic
to yeast and bacteria through its capacity to permeabilize
membranes [48]. These data suggest that this VOC
might have a function in defense against pathogens, a
possibility that deserves further investigation.